Estimating Node Similarity by Sampling Streaming Bipartite Graphs
Authors
Abstract
Bipartite graph data increasingly occurs as a stream of edges that represent transactions, e.g., purchases by retail customers. Applications such as recommender systems employ neighborhood-based measures of node similarity, such as the pairwise number of common neighbors (CN) and related metrics. While the number of node pairs that share neighbors is potentially enormous, in real-world graphs only a relatively small proportion of all pairs have a large number of common neighbors. This motivates finding a weighted sampling approach that preferentially samples such node pairs. This paper presents a new sampling algorithm that provides a fixed-size unbiased estimate of the similarity (or projected) graph on a bipartite edge stream. The algorithm has two components. First, it maintains a reservoir of sampled bipartite edges with sampling weights that favor selection of high-similarity nodes. Second, arriving edges generate a stream of similarity updates based on their adjacency with the current sample. These updates are aggregated in a second, reservoir-sample-based stream aggregator to yield the final unbiased estimate. Experiments on real-world graphs show that a 10% sample at each stage yields estimates of high-similarity edges with weighted relative errors of about 10^{-2}.
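The two-stage idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: uniform reservoir sampling (Algorithm R) stands in for the weighted edge sampling, and the similarity updates are summed exactly rather than passed through a second reservoir. The function name `estimate_common_neighbors`, its parameters, and the toy edges are hypothetical, and the stream is assumed to contain no duplicate edges. Each arriving edge is matched against sampled edges that share its item, and every match is weighted by the inverse of the current inclusion probability (a Horvitz-Thompson weight), which keeps the common-neighbor estimate unbiased under these assumptions.

```python
import random
from collections import defaultdict


def estimate_common_neighbors(edge_stream, capacity, seed=0):
    """Estimate pairwise common-neighbor (CN) counts from (user, item) edges.

    Simplifying assumptions (not from the paper): uniform reservoir sampling,
    exact aggregation of updates, and no duplicate edges in the stream.
    """
    rng = random.Random(seed)
    reservoir = []                  # sampled (user, item) edges
    by_item = defaultdict(set)      # item -> users with a sampled edge to it
    estimates = defaultdict(float)  # (user_a, user_b) -> estimated CN
    seen = 0                        # edges that arrived before the current one

    for user, item in edge_stream:
        if seen > 0:
            # Every earlier edge is in the reservoir with this probability.
            p = min(1.0, capacity / seen)
            for other in by_item.get(item, ()):
                if other != user:
                    pair = (min(user, other), max(user, other))
                    estimates[pair] += 1.0 / p  # inverse-probability weight

        # Standard Algorithm R update of the edge reservoir.
        seen += 1
        if len(reservoir) < capacity:
            reservoir.append((user, item))
            by_item[item].add(user)
        else:
            j = rng.randrange(seen)
            if j < capacity:
                old_user, old_item = reservoir[j]
                by_item[old_item].discard(old_user)
                reservoir[j] = (user, item)
                by_item[item].add(user)
    return dict(estimates)


# Toy stream: customers a, b, c buying items x, y.
edges = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y"), ("c", "y")]
exact = estimate_common_neighbors(edges, capacity=len(edges))
approx = estimate_common_neighbors(edges, capacity=2)
```

When the reservoir is large enough to hold the whole stream, every inclusion probability is 1 and the estimates equal the exact common-neighbor counts. With a smaller reservoir, individual estimates vary from run to run but are correct in expectation.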
Similar resources
Sequential Importance Sampling for Bipartite Graphs With Applications to Likelihood-Based Inference
The ability to simulate graphs with given properties is important for the analysis of social networks. Sequential importance sampling has been shown to be particularly effective in estimating the number of graphs adhering to fixed marginals and in estimating the null distribution of test statistics. This paper builds on the work of Chen et al. (2005), providing an intuitive explanation of the s...
Using Random Walks to Generate Associations between Objects
Measuring similarities between objects based on their attributes has been an important problem in many disciplines. Object-attribute associations can be depicted as links on a bipartite graph. A similarity measure can be thought as a unipartite projection of this bipartite graph. The most widely used bipartite projection techniques make assumptions that are not often fulfilled in real life syst...
Balanced Degree-Magic Labelings of Complete Bipartite Graphs under Binary Operations
A graph is called supermagic if there is a labeling of edges where the edges are labeled with consecutive distinct positive integers such that the sum of the labels of all edges incident with any vertex is constant. A graph G is called degree-magic if there is a labeling of the edges by integers 1, 2, ..., |E(G)| such that the sum of the labels of the edges incident with any vertex v is equal t...
Maximum Matching in Semi-streaming with Few Passes
In the semi-streaming model, an algorithm receives a stream of edges of a graph in arbitrary order and uses a memory of size O(n polylog n), where n is the number of vertices of a graph. In this work, we present semi-streaming algorithms that perform one or two passes over the input stream for Maximum Matching with no restrictions on the input graph, and for the important special case of bipartit...
networksis: A Package to Simulate Bipartite Graphs with Fixed Marginals Through Sequential Importance Sampling.
The ability to simulate graphs with given properties is important for the analysis of social networks. Sequential importance sampling has been shown to be particularly effective in estimating the number of graphs adhering to fixed marginals and in estimating the null distribution of graph statistics. This paper describes the networksis package for R and how its simulate and simulate_sis functio...
Journal title:
- CoRR
Volume abs/1712.08685 Issue
Pages -
Publication date 2017